Enriching Statistical Translation Models Using a Domain-Independent Multilingual Lexical Knowledge Base

Authors

  • Miguel García
  • Jesús Giménez
  • Lluís Màrquez i Villodre

Abstract

This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilingual lexical knowledge base. The proposed method exploits the Multilingual Central Repository (a group of linked WordNets from different languages) as a domain-independent knowledge base to provide translation models with new possible translations for a large set of lexical tokens. Translation probabilities for these tokens are estimated using a set of simple heuristics based on WordNet topology and local context. During decoding, these probabilities are softly integrated so that they can interact with the other statistical models. We have applied this type of domain-independent translation modeling to several translation tasks and consistently obtained a moderate but significant improvement in translation quality according to a number of standard automatic evaluation metrics. The improvement is especially notable when moving to a very different domain, such as the translation of Biblical texts.
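To make the idea of "new possible translations from linked WordNets" and "soft integration" concrete, here is a minimal sketch under stated assumptions. The synset IDs, the toy English/Spanish data, the function names `candidate_translations` and `log_linear_score`, and the uniform-over-senses scoring rule are all hypothetical illustrations, not the authors' heuristics; the local-context heuristics mentioned in the abstract are omitted entirely. The lexicon-derived probability is simply added as one more weighted log-domain feature, the usual way extra models interact in a log-linear phrase-based decoder.

```python
import math
from collections import defaultdict
from typing import Dict, Set

# Hypothetical toy data: synset ID -> {language code -> lemmas}.
# A real system would read linked WordNets such as the MCR instead.
LinkedSynsets = Dict[str, Dict[str, Set[str]]]


def candidate_translations(
    word: str, src: str, tgt: str, synsets: LinkedSynsets
) -> Dict[str, float]:
    """Estimate p(target | word) from synsets shared across languages.

    Assumed heuristic (illustrative only): each synset that contains `word`
    and has at least one target lemma is equally likely, and it splits its
    probability mass uniformly among its target-language lemmas.
    """
    senses = [
        s for s in synsets.values()
        if word in s.get(src, set()) and s.get(tgt)
    ]
    probs: Dict[str, float] = defaultdict(float)
    for sense in senses:
        targets = sense[tgt]
        for lemma in targets:
            probs[lemma] += 1.0 / (len(senses) * len(targets))
    return dict(probs)


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of log-domain feature values, as in a log-linear decoder."""
    return sum(weights[name] * value for name, value in features.items())


if __name__ == "__main__":
    synsets: LinkedSynsets = {
        "syn-1": {"en": {"bank"}, "es": {"orilla", "ribera"}},  # river bank
        "syn-2": {"en": {"bank", "depository"}, "es": {"banco"}},  # institution
    }
    table = candidate_translations("bank", "en", "es", synsets)
    print(sorted(table.items()))  # [('banco', 0.5), ('orilla', 0.25), ('ribera', 0.25)]
    # The lexicon-derived probability enters as one more weighted feature,
    # so it can be outweighed by the corpus-trained phrase table (soft integration).
    score = log_linear_score(
        {"phrase_tm": math.log(0.4), "mcr_tm": math.log(table["banco"])},
        {"phrase_tm": 1.0, "mcr_tm": 0.5},
    )
    print(score)
```

Because the new feature has its own weight, tuning can decide how much to trust the lexicon relative to the in-domain phrase table rather than forcing its translations into the output.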


Similar resources

Multilingual Lexical Representation

The approach to multilingual lexical representation developed as part of the ACQUILEX Lexical Knowledge Base (LKB) is discussed with specific reference to complex translation equivalence. The treatment described provides a lexicalist account of translation mismatches in terms of translation links which capture cross-linguistic generalizations across sets of semantically related lexical items, and ...


Improving Machine Translation through Linked Data

With the ever increasing availability of linked multilingual lexical resources, there is a renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. In the case of Machine Translation, MT systems can potentially benefit from such a resource. Unknown words and ambiguous translat...


Towards Universal Multilingual Knowledge Bases

Lexical, ontological, as well as encyclopedic knowledge is increasingly being encoded in machine-readable form. This paper deals with knowledge representation in multilingual settings. It begins by proposing a generic graph-based knowledge base framework, and then, in three case studies, explains how preexisting knowledge can be cast into this framework. The first case study involves enriching ...


The Habanera Lexical Knowledge Base Management System

Habanera is a multipurpose multilingual lexical knowledge base that is developed at CRL to be used as a central repository of multilingual lexical data. The knowledge base contains a set of dictionaries and relations between entries, within a dictionary (e.g., synonymy) as well as between entries of different dictionaries (e.g., translation). The format of monolingual lexical entries is left re...


A multilingual ontology matcher

State-of-the-art multilingual ontology matchers use machine translation to reduce the problem to the monolingual case. We investigate an alternative, self-contained solution based on semantic matching where labels are parsed by multilingual natural language processing and then matched using a language-independent knowledge base acting as an interlingua. As the method relies on the availability ...


Publication date: 2009